Abstract

Minimizing the rank of a matrix subject to constraints is a challenging problem that arises
in many applications in control theory, machine learning, and discrete geometry. This class of
optimization problems, known as rank minimization, is NP-HARD, and for most practical problems
there are no efficient algorithms that yield exact solutions. A popular heuristic algorithm
replaces the rank function with the nuclear norm, equal to the sum of the singular values of
the decision variable. In this paper, we provide a necessary and sufficient condition that quantifies
when this heuristic successfully finds the minimum rank solution of a linear constraint set.
We additionally provide a probability distribution over instances of the affine rank minimization
problem such that instances sampled from this distribution satisfy our conditions for success
with overwhelming probability provided the number of constraints is appropriately large. Finally,
we give empirical evidence that these probabilistic bounds provide accurate predictions
of the heuristic's performance in non-asymptotic scenarios.

AMS (MOC) Subject Classification 90C25; 90C59; 15A52. 
1 Introduction

Optimization problems involving constraints on the rank of matrices are pervasive in applications.
In Control Theory, such problems arise in the context of low-order controller design [9, 19], minimal
realization theory, and model reduction [3]. In Machine Learning, problems in inference with
partial information [23], multi-task learning [1], and manifold learning [28] have been formulated as
rank minimization problems. Rank minimization also plays a key role in the study of embeddings
of discrete metric spaces in Euclidean space [16]. In certain instances with special structure, rank
minimization problems can be solved via the singular value decomposition or can be reduced to the
solution of a linear system [19, 20]. In general, however, minimizing the rank of a matrix subject
to convex constraints is NP-HARD. The best exact algorithms for this problem involve quantifier
elimination, and such solution methods require at least exponential time in the dimensions of the
matrix variables.

A popular heuristic for solving rank minimization problems in the controls community is the
"trace heuristic", where one minimizes the trace of a positive semidefinite decision variable instead of
the rank (see, e.g., [HQS]). A generalization of this heuristic to non-symmetric matrices, introduced
by Fazel in [10], minimizes the nuclear norm, or the sum of the singular values of the matrix, over
the constraint set. When the matrix variable is symmetric and positive semidefinite, this heuristic
is equivalent to the trace heuristic, as the trace of a positive semidefinite matrix is equal to the
sum of its singular values. The nuclear norm is a convex function and can be optimized efficiently
via semidefinite programming. Both the trace heuristic and the nuclear norm generalization have
been observed to produce very low-rank solutions in practice, but, until very recently, conditions
under which the heuristic succeeds were only available in cases that could also be solved by elementary
linear algebra [20].

The first non-trivial sufficient conditions that guaranteed the success of the nuclear norm heuristic
were provided in [21]. Focusing on the special case where one seeks the lowest rank matrix in an
affine subspace, the authors provide a "restricted isometry" condition on the linear map defining
the affine subspace which guarantees that the minimum nuclear norm solution is the minimum rank
solution. Moreover, they provide several ensembles of affine constraints where this sufficient condition
holds with overwhelming probability. Their work builds on seminal developments in "compressed
sensing" that determined conditions for when minimizing the $\ell_1$ norm of a vector over an affine
space returns the sparsest vector in that space (see, e.g., [6], [5], [3]). There is a strong parallelism
between the sparse approximation and rank minimization settings. The rank of a diagonal matrix
is equal to the number of non-zeros on the diagonal. Similarly, the sum of the singular values of a
diagonal matrix is equal to the $\ell_1$ norm of the diagonal. Exploiting these parallels, the authors in [21]
were able to extend much of the analysis developed for the $\ell_1$ heuristic to provide guarantees for
the nuclear norm heuristic.

Building on a different collection of developments in compressed sensing [7] El [25], we present
a necessary and sufficient condition for the solution of the nuclear norm heuristic to coincide with
the minimum rank solution in an affine space. The condition characterizes a particular property
of the null space of the linear map which defines the affine space. We show that when the linear
map defining the constraint set is generated by sampling its entries independently from a Gaussian
distribution, the null-space characterization holds with overwhelming probability provided the
dimensions of the equality constraints are of appropriate size. We provide numerical experiments
demonstrating that even when matrix dimensions are small, the nuclear norm heuristic does indeed
always recover the minimum rank solution when the number of constraints is sufficiently
large. Empirically, we observe that our probabilistic bounds accurately predict when the heuristic
succeeds.



1.1 Main Results 

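Let $X$ be an $n_1 \times n_2$ matrix decision variable. Without loss of generality, we will assume throughout
that $n_1 \le n_2$. Let $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ be a linear map, and let $b \in \mathbb{R}^m$. The main optimization
problem under study is

$$\begin{array}{ll} \text{minimize} & \operatorname{rank}(X) \\ \text{subject to} & \mathcal{A}(X) = b. \end{array} \tag{1.1}$$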
This problem is known to be NP-HARD and is also hard to approximate [18]. As mentioned
above, a popular heuristic for this problem replaces the rank function with the sum of the singular
values of the decision variable. Let $\sigma_i(X)$ denote the $i$-th largest singular value of $X$ (equal to
the square root of the $i$-th largest eigenvalue of $XX^*$). Recall that the rank of $X$ is equal to the
number of nonzero singular values. In the case when the nonzero singular values are all equal to one, the
sum of the singular values is equal to the rank. When the singular values are less than or equal to
one, the sum of the singular values is a convex function that is no larger than the rank. This sum
of the singular values is a unitarily invariant matrix norm, called the nuclear norm, and is denoted

$$\|X\|_* := \sum_{i=1}^{r} \sigma_i(X).$$

This norm is alternatively known by several other names including the Schatten 1-norm, the Ky
Fan norm, and the trace class norm.



As described in the introduction, our main concern is when the optimal solution of (1.1) coincides
with the optimal solution of

$$\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \mathcal{A}(X) = b. \end{array} \tag{1.2}$$

This optimization is convex, and can be efficiently solved via a variety of methods including semidef- 
inite programming (see [21] for a survey). 

Whenever $m < n_1 n_2$, the null space of $\mathcal{A}$, that is, the set of $Y$ such that $\mathcal{A}(Y) = 0$, contains nonzero matrices.
Note that a feasible $X$ is an optimal solution for (1.2) if and only if for every $Y$ in the null space of $\mathcal{A}$

$$\|X + Y\|_* \ge \|X\|_*. \tag{1.3}$$

The following theorem generalizes this null-space criterion to a critical property that guarantees
when the nuclear norm heuristic finds the minimum rank solution of $\mathcal{A}(X) = b$ for all values of the
vector $b$. Our main result is the following.

Theorem 1.1 Let $X_0$ be the optimal solution of (1.1) and assume that $X_0$ has rank $r < n_1/2$.
Then

1. If for every $Y$ in the null space of $\mathcal{A}$ and for every decomposition

$$Y = Y_1 + Y_2,$$

where $Y_1$ has rank $r$ and $Y_2$ has rank greater than $r$, it holds that

$$\|Y_1\|_* < \|Y_2\|_*,$$

then $X_0$ is the unique minimizer of (1.2).

2. Conversely, if the condition of part 1 does not hold, then there exists a vector $b \in \mathbb{R}^m$ such
that the minimum rank solution of $\mathcal{A}(X) = b$ has rank at most $r$ and is not equal to the
minimum nuclear norm solution.
This result is of interest for multiple reasons. First, as shown in [22], a variety of rank
minimization problems, including those with inequality and semidefinite cone constraints, can be
reformulated in the form of (1.1). Secondly, we now present a family of random equality constraints
under which the nuclear norm heuristic succeeds with overwhelming probability. We prove both of
the following two theorems by showing that $\mathcal{A}$ obeys the null-space criteria of Equation (1.3) and
Theorem 1.1, respectively, with overwhelming probability.

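Note that for a linear map $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$, we can always find an $m \times n_1 n_2$ matrix $\mathbf{A}$ such
that

$$\mathcal{A}(X) = \mathbf{A}\operatorname{vec}(X). \tag{1.4}$$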
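
The identity (1.4) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; `matrix_of_map` and `example_map` are hypothetical names introduced only for this illustration. It builds the matrix of a linear map column by column from the images of the standard basis matrices, ordered to match column stacking.

```python
import numpy as np

def matrix_of_map(lin_map, n1, n2):
    """Return the m x (n1*n2) matrix A with lin_map(X) == A @ vec(X)."""
    columns = []
    for j in range(n2):          # vec stacks columns, so j is the outer index
        for i in range(n1):
            E = np.zeros((n1, n2))
            E[i, j] = 1.0        # basis matrix with a single unit entry
            columns.append(lin_map(E))
    return np.column_stack(columns)

def vec(X):
    """Stack the columns of X on top of one another."""
    return X.flatten(order="F")

def example_map(X):
    # Hypothetical linear map from 3 x 3 matrices to R^3 (illustration only).
    return np.array([np.trace(X), X[0, 1] - X[1, 0], X.sum()])

A = matrix_of_map(example_map, 3, 3)
X = np.arange(9.0).reshape(3, 3)
print(np.allclose(A @ vec(X), example_map(X)))
```

The column ordering matters: with row-major flattening the same construction would represent the map applied to the transpose of X.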

In the case where $\mathbf{A}$ has entries sampled independently from a zero-mean, unit-variance Gaussian
distribution, the null-space characterization of Theorem 1.1 holds with overwhelming probability
provided $m$ is large enough. For simplicity of notation in the theorem statements, we consider
the case of square matrices. These results can then be translated to rectangular matrices by
padding with rows/columns of zeros to make the matrix square. We define the random ensemble
of $d_1 \times d_2$ matrices $\mathfrak{G}(d_1, d_2)$ to be the Gaussian ensemble, with each entry sampled i.i.d. from a
Gaussian distribution with zero mean and variance one. We also denote $\mathfrak{G}(d, d)$ by $\mathfrak{G}(d)$.

The first result characterizes when a particular low-rank matrix can be recovered from a random 
linear system via nuclear norm minimization. 

Theorem 1.2 (Weak Bound) Let $X_0$ be an $n \times n$ matrix of rank $r = \beta n$. Let $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^{\mu n^2}$
denote the random linear transformation

$$\mathcal{A}(X) = \mathbf{A}\operatorname{vec}(X),$$

where $\mathbf{A}$ is sampled from $\mathfrak{G}(\mu n^2, n^2)$. Then whenever

$$\mu > 1 - \frac{64}{9\pi^2}\left((1-\beta)^{3/2} - \beta^{3/2}\right)^2 \tag{1.5}$$

there exists a numerical constant $c_w(\mu, \beta) > 0$ such that with probability exceeding $1 - e^{-c_w(\mu,\beta) n^2}$,

$$X_0 = \arg\min\{\|Z\|_* : \mathcal{A}(Z) = \mathcal{A}(X_0)\}.$$

In particular, if $\beta$ and $\mu$ satisfy (1.5), then nuclear norm minimization will recover $X_0$ from a
random set of $\mu n^2$ constraints drawn from the Gaussian ensemble almost surely as $n \to \infty$.
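
To get a feel for the Weak Bound, the threshold on the right-hand side of (1.5) can be evaluated directly. The sketch below is illustrative only; the function name `weak_bound_mu` is not from the paper, and returning 1 for $\beta \ge 1/2$ is an assumption made here because the quantity $(1-\beta)^{3/2} - \beta^{3/2}$ is no longer positive in that range.

```python
import math

def weak_bound_mu(beta):
    """Smallest constraint fraction mu allowed by (1.5) for rank fraction beta."""
    if beta >= 0.5:
        # Assumption for this sketch: no guarantee is claimed once
        # (1 - beta)^{3/2} - beta^{3/2} stops being positive.
        return 1.0
    gap = (1.0 - beta) ** 1.5 - beta ** 1.5
    return 1.0 - 64.0 / (9.0 * math.pi ** 2) * gap ** 2

for beta in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"beta = {beta:.1f}  ->  mu must exceed {weak_bound_mu(beta):.3f}")
```

For example, a rank fraction near zero still requires $\mu$ above $1 - 64/(9\pi^2)$, which is roughly 0.28 of the $n^2$ degrees of freedom.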

The second theorem characterizes when the nuclear norm heuristic succeeds at recovering all 
low rank matrices. 



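Theorem 1.3 (Strong Bound) Let $\mathcal{A}$ be defined as in Theorem 1.2. Define the two functions

$$f(\beta, \epsilon) = \frac{8}{3\pi}\,\frac{(1-\beta)^{3/2} - \beta^{3/2} - 4\epsilon}{1 + 4\epsilon},$$

$$g(\beta, \epsilon) = \sqrt{2\beta(2-\beta)\log\left(\frac{3\pi}{2\epsilon}\right)}.$$

Then there exists a numerical constant $c_s(\mu, \beta) > 0$ such that with probability exceeding
$1 - e^{-c_s(\mu,\beta) n^2}$, for all $n \times n$ matrices $X_0$ of rank $r \le \beta n$

$$X_0 = \arg\min\{\|Z\|_* : \mathcal{A}(Z) = \mathcal{A}(X_0)\}$$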
whenever

$$\mu > 1 - \sup_{\substack{\epsilon > 0 \\ f(\beta,\epsilon) - g(\beta,\epsilon) > 0}} \bigl(f(\beta,\epsilon) - g(\beta,\epsilon)\bigr)^2. \tag{1.6}$$

In particular, if $\beta$ and $\mu$ satisfy (1.6), then nuclear norm minimization will recover all rank $r$
matrices from a random set of $\mu n^2$ constraints drawn from the Gaussian ensemble almost surely as
$n \to \infty$.

Figure 1: The Weak Bound (1.5) versus the Strong Bound (1.6).
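
The supremum in the Strong Bound has no obvious closed form, but it is easy to approximate on a grid. The following is a rough sketch under stated assumptions: it takes $f(\beta,\epsilon) = \frac{8}{3\pi}\frac{(1-\beta)^{3/2}-\beta^{3/2}-4\epsilon}{1+4\epsilon}$ and $g(\beta,\epsilon) = \sqrt{2\beta(2-\beta)\log(c/\epsilon)}$, where the constant $c$ comes from the size of the $\epsilon$-net and is assumed here to be $3\pi/2$ (exposed as the parameter `net_const`). The function names and the grid search are illustrative, not the authors' procedure.

```python
import math

def f(beta, eps):
    gap = (1.0 - beta) ** 1.5 - beta ** 1.5
    return 8.0 / (3.0 * math.pi) * (gap - 4.0 * eps) / (1.0 + 4.0 * eps)

def g(beta, eps, net_const=1.5 * math.pi):
    # net_const is an assumption of this sketch (constant in the net size).
    return math.sqrt(2.0 * beta * (2.0 - beta) * math.log(net_const / eps))

def strong_bound_mu(beta, grid=2000):
    """Approximate 1 - sup (f - g)^2 over eps with f - g > 0."""
    best = 0.0
    for k in range(1, grid + 1):
        eps = 0.25 * k / grid        # f is non-positive once eps >= 1/4
        diff = f(beta, eps) - g(beta, eps)
        if diff > 0.0:
            best = max(best, diff * diff)
    return 1.0 - best                # equals 1 when no eps gives f - g > 0

for beta in (0.0, 0.001, 0.01, 0.1):
    print(f"beta = {beta:<6} ->  mu must exceed about {strong_bound_mu(beta):.3f}")
```

A returned value of 1 means the bound gives no guarantee at that rank fraction, which illustrates how much smaller the region covered by the Strong Bound is.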



Figure 1 plots the bounds from Theorems 1.2 and 1.3. We call (1.5) the Weak Bound because
it is a condition that depends on the optimal solution of (1.1). On the other hand, we call (1.6)
the Strong Bound as it guarantees the nuclear norm heuristic succeeds no matter what the optimal
solution. The Weak Bound is the only bound that can be tested experimentally, and, in Section 4,
we will show that it corresponds well to experimental data. Moreover, the Weak Bound provides
guaranteed recovery over a far larger region of $(\beta, \mu)$ parameter space. Nonetheless, the mere
existence of a Strong Bound is surprising in and of itself and results in a much better bound than what
was available from previous results (cf. [21]).



1.2 Notation and Preliminaries 



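For a rectangular matrix $X \in \mathbb{R}^{n_1 \times n_2}$, $X^*$ denotes the transpose of $X$. $\operatorname{vec}(X)$ denotes the vector
in $\mathbb{R}^{n_1 n_2}$ with the columns of $X$ stacked on top of one another.

For vectors $v \in \mathbb{R}^d$, the only norm we will ever consider is the Euclidean norm

$$\|v\|_{\ell_2} = \left(\sum_{i=1}^{d} v_i^2\right)^{1/2}.$$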



On the other hand, we will consider a variety of matrix norms. For matrices $X$ and $Y$ of the same
dimensions, we define the inner product in $\mathbb{R}^{n_1 \times n_2}$ as $\langle X, Y \rangle := \operatorname{trace}(X^* Y) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} X_{ij} Y_{ij}$.
The norm associated with this inner product is called the Frobenius (or Hilbert-Schmidt) norm
$\|\cdot\|_F$. The Frobenius norm is also equal to the Euclidean, or $\ell_2$, norm of the vector of singular
values, i.e.,

$$\|X\|_F := \sqrt{\langle X, X \rangle} = \sqrt{\operatorname{trace}(X^* X)} = \left(\sum_{i=1}^{r} \sigma_i^2\right)^{1/2}.$$

The operator norm (or induced 2-norm) of a matrix is equal to its largest singular value (i.e., the
$\ell_\infty$ norm of the singular values):

$$\|X\| := \sigma_1(X).$$

The nuclear norm of a matrix is equal to the sum of its singular values, i.e.,

$$\|X\|_* := \sum_{i=1}^{r} \sigma_i(X).$$

These three norms are related by the following inequalities, which hold for any matrix $X$ of rank at
most $r$:

$$\|X\| \le \|X\|_F \le \|X\|_* \le \sqrt{r}\,\|X\|_F \le r\,\|X\|. \tag{1.7}$$
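
As a quick sanity check of the chain of inequalities relating the operator, Frobenius, and nuclear norms, the sketch below computes all three from the singular values of a random rank-3 matrix. It assumes NumPy; the helper name `three_norms` and the test matrix are illustrative choices.

```python
import numpy as np

def three_norms(X):
    """Return (operator, Frobenius, nuclear) norms from the singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return s.max(), np.sqrt((s ** 2).sum()), s.sum()

rng = np.random.default_rng(1)
r = 3
# Product of an 8 x r and an r x 10 Gaussian factor: rank at most r.
X = rng.standard_normal((8, r)) @ rng.standard_normal((r, 10))

op, fro, nuc = three_norms(X)
tol = 1e-9  # absorbs round-off from the numerically zero singular values
print(op <= fro + tol, fro <= nuc + tol,
      nuc <= np.sqrt(r) * fro + tol, np.sqrt(r) * fro <= r * op + tol)
```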
To any norm $\|\cdot\|_p$, we may associate a dual norm $\|\cdot\|_d$ via the following variational definition

$$\|X\|_d = \sup_{\|Y\|_p = 1} \langle Y, X \rangle.$$

One can readily check that the dual norm of the Frobenius norm is the Frobenius norm. Less trivially,
one can show that the dual norm of the operator norm is the nuclear norm (see, for example, [21]).
We will leverage the duality between the operator and nuclear norms several times in our analysis.



2 Necessary and Sufficient Conditions 



We first prove our necessary and sufficient condition for success of the nuclear norm heuristic. We 
will need the following two technical lemmas. The first is an easily verified fact. 



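Lemma 2.1 Suppose $X$ and $Y$ are $n_1 \times n_2$ matrices such that $X^* Y = 0$ and $X Y^* = 0$. Then
$\|X + Y\|_* = \|X\|_* + \|Y\|_*$.

Indeed, if $X^* Y = 0$ and $X Y^* = 0$, we can find a coordinate system in which

$$X = \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} 0 & 0 \\ 0 & B \end{bmatrix},$$

from which the lemma trivially follows. The next lemma allows us to exploit Lemma 2.1 in our
proof.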



Lemma 2.2 Let $X$ be an $n_1 \times n_2$ matrix with rank $r < \frac{n_1}{2}$ and $Y$ be an arbitrary $n_1 \times n_2$ matrix.
Let $P_X^c$ and $P_X^r$ be the matrices that project onto the column and row spaces of $X$ respectively. Then
if $P_X^c Y P_X^r$ has full rank, $Y$ can be decomposed as

$$Y = Y_1 + Y_2,$$

where $Y_1$ has rank $r$, and

$$\|X + Y_2\|_* = \|X\|_* + \|Y_2\|_*.$$
Proof Without loss of generality, we can write $X$ as

$$X = \begin{bmatrix} X_{11} & 0 \\ 0 & 0 \end{bmatrix},$$

where $X_{11}$ is $r \times r$ and full rank. Accordingly, $Y$ becomes

$$Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix},$$

where $Y_{11}$ is full rank since $P_X^c Y P_X^r$ is. The decomposition is now clearly

$$Y = \underbrace{\begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{21} Y_{11}^{-1} Y_{12} \end{bmatrix}}_{Y_1} + \underbrace{\begin{bmatrix} 0 & 0 \\ 0 & Y_{22} - Y_{21} Y_{11}^{-1} Y_{12} \end{bmatrix}}_{Y_2}.$$

That $Y_1$ has rank $r$ follows from the fact that the rank of a block matrix is equal to the rank of
a diagonal block plus the rank of its Schur complement (see, e.g., [14, §2.2]). That $\|X + Y_2\|_* =
\|X\|_* + \|Y_2\|_*$ follows from Lemma 2.1.
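
The decomposition used in this proof is easy to reproduce numerically. The following sketch assumes NumPy and works in the coordinate system of the proof, where $X$ is supported on its leading $r \times r$ block; the function names are illustrative. It builds $Y_1$ and $Y_2$ from the blocks of a random $Y$ and checks the two claimed properties.

```python
import numpy as np

def nuclear_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

def split_as_in_lemma_2_2(Y, r):
    """Split Y = Y1 + Y2 with rank(Y1) = r and Y2 supported on the trailing block."""
    Y11, Y12 = Y[:r, :r], Y[:r, r:]
    Y21, Y22 = Y[r:, :r], Y[r:, r:]
    fill = Y21 @ np.linalg.solve(Y11, Y12)   # Y21 Y11^{-1} Y12
    Y1 = np.block([[Y11, Y12], [Y21, fill]])
    Y2 = np.zeros_like(Y)
    Y2[r:, r:] = Y22 - fill                  # Schur complement of Y11 in Y
    return Y1, Y2

rng = np.random.default_rng(0)
n, r = 6, 2
X = np.zeros((n, n))
X[:r, :r] = rng.standard_normal((r, r))      # X supported on the leading block
Y = rng.standard_normal((n, n))              # Y11 is invertible almost surely

Y1, Y2 = split_as_in_lemma_2_2(Y, r)
print(np.linalg.matrix_rank(Y1))
print(np.isclose(nuclear_norm(X + Y2), nuclear_norm(X) + nuclear_norm(Y2)))
```

The additivity holds because $X$ and $Y_2$ occupy complementary diagonal blocks, which is exactly the situation of Lemma 2.1.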



We can now provide a proof of Theorem 1.1.

Proof We begin by proving the converse. Assume the condition of part 1 is violated, i.e., there
exists some $Y$ such that $\mathcal{A}(Y) = 0$, $Y = Y_1 + Y_2$, $\operatorname{rank}(Y_2) > \operatorname{rank}(Y_1) = r$, yet $\|Y_1\|_* \ge \|Y_2\|_*$.
Now take $X_0 = Y_1$ and $b = \mathcal{A}(X_0)$. Clearly, $\mathcal{A}(-Y_2) = b$ (since $Y$ is in the null space), and so we
have found a matrix of higher rank whose nuclear norm is no larger than that of $X_0$.

For the other direction, assume the condition of part 1 holds. Let $X_*$ denote the minimum nuclear
norm solution of (1.2), and use Lemma 2.2 with $X = X_0$ and $Y = X_* - X_0$. That is, let $P^c_{X_0}$ and
$P^r_{X_0}$ be the matrices that project onto the column and row spaces of $X_0$ respectively, and assume
that $P^c_{X_0}(X_* - X_0)P^r_{X_0}$ has full rank. Write $X_* - X_0 = Y_1 + Y_2$, where $Y_1$ has rank $r$ and
$\|X_0 + Y_2\|_* = \|X_0\|_* + \|Y_2\|_*$. Assume further that $Y_2$ has rank larger than $r$ (recall $r < n_1/2$).
We will consider the case where $P^c_{X_0}(X_* - X_0)P^r_{X_0}$ does not have full rank and/or $Y_2$ has rank
less than or equal to $r$ in the appendix. We now have:

$$\begin{aligned}
\|X_*\|_* &= \|X_0 + X_* - X_0\|_* \\
&= \|X_0 + Y_1 + Y_2\|_* \\
&\ge \|X_0 + Y_2\|_* - \|Y_1\|_* \\
&= \|X_0\|_* + \|Y_2\|_* - \|Y_1\|_* \qquad \text{by Lemma 2.2.}
\end{aligned}$$

But $\mathcal{A}(Y_1 + Y_2) = 0$, so $\|Y_2\|_* - \|Y_1\|_*$ is non-negative and therefore $\|X_*\|_* \ge \|X_0\|_*$. Since $X_*$ is the
minimum nuclear norm solution, this implies that $X_0 = X_*$. ■

For the interested reader, the argument for the case where $P^c_{X_0}(X_* - X_0)P^r_{X_0}$ does not have full
rank or $Y_2$ has rank less than or equal to $r$ can be found in the appendix.



3 Proofs of the Probabilistic Bounds 



We now turn to the proofs of the probabilistic bounds (1.5) and (1.6). We first provide a sufficient
condition which implies the necessary and sufficient null-space conditions. Then, noting that the
null space of $\mathcal{A}$ is spanned by Gaussian vectors, we use bounds from probability on Banach spaces
to show that the sufficient conditions are met. This will require the introduction of two useful
auxiliary functions whose actions on Gaussian processes are explored in Section 3.4.

3.1 Sufficient Condition for Null-space Characterizations

The following theorem gives us a new condition that implies our necessary and sufficient condition. 

Theorem 3.1 Let $\mathcal{A}$ be a linear map of $n \times n$ matrices into $\mathbb{R}^m$. Suppose that for every $Y$ in the
null space of $\mathcal{A}$ and any projection operators $P$ and $Q$ onto $r$-dimensional subspaces,

$$\|(I-P)Y(I-Q)\|_* \ge \|PYQ\|_*. \tag{3.1}$$

Then for every matrix $Z$ with row and column spaces equal to the range of $Q$ and $P$ respectively,

$$\|Z + Y\|_* \ge \|Z\|_*$$

for all $Y$ in the null space of $\mathcal{A}$. In particular, if (3.1) holds for every pair of projection operators $P$
and $Q$, then for every $Y$ in the null space of $\mathcal{A}$ and for every decomposition $Y = Y_1 + Y_2$ where $Y_1$
has rank $r$ and $Y_2$ has rank greater than $r$, it holds that

$$\|Y_1\|_* \le \|Y_2\|_*.$$

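We will need the following lemma.

Lemma 3.2 For any block partitioned matrix

$$X = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

we have $\|X\|_* \ge \|A\|_* + \|D\|_*$.

Proof This lemma follows from the dual description of the nuclear norm:

$$\|X\|_* = \sup\left\{ \left\langle \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix}, \begin{bmatrix} A & B \\ C & D \end{bmatrix} \right\rangle \;:\; \left\| \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix} \right\| = 1 \right\} \tag{3.2}$$

and similarly

$$\|A\|_* + \|D\|_* = \sup\left\{ \left\langle \begin{bmatrix} Z_{11} & 0 \\ 0 & Z_{22} \end{bmatrix}, \begin{bmatrix} A & B \\ C & D \end{bmatrix} \right\rangle \;:\; \left\| \begin{bmatrix} Z_{11} & 0 \\ 0 & Z_{22} \end{bmatrix} \right\| = 1 \right\}. \tag{3.3}$$

Since (3.2) is a supremum over a larger set than (3.3), the claim follows. ■

Theorem 3.1 now trivially follows.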



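Proof [of Theorem 3.1] Without loss of generality, we may choose coordinates such that $P$ and
$Q$ both project onto the space spanned by the first $r$ standard basis vectors. Then we may partition $Y$
as

$$Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}$$

and write, using Lemma 3.2 and identifying $Z$ with its nonzero $r \times r$ block,

$$\begin{aligned}
\|Y - Z\|_* - \|Z\|_* &= \left\| \begin{bmatrix} Y_{11} - Z & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} \right\|_* - \|Z\|_* \\
&\ge \|Y_{11} - Z\|_* + \|Y_{22}\|_* - \|Z\|_* \\
&\ge \|Y_{22}\|_* - \|Y_{11}\|_*,
\end{aligned}$$

which is non-negative by assumption. Note that if the theorem holds for all projection operators $P$
and $Q$ whose range has dimension $r$, then $\|Z + Y\|_* \ge \|Z\|_*$ for all matrices $Z$ of rank $r$ and hence
the second part of the theorem follows. ■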
3.2 Proof of the Weak Bound

Now we can turn to the proof of Theorem 1.2. The key observation in proving this theorem is the
following characterization of the null space of $\mathcal{A}$ provided by Stojnic et al. [25].

Lemma 3.3 The null space of $\mathcal{A}$ is identically distributed to the span of $n^2(1-\mu)$ matrices $G_i$
where each $G_i$ is sampled i.i.d. from $\mathfrak{G}(n)$.

This is nothing more than a statement that the null space of $\mathcal{A}$ is a random subspace. However,
when we parameterize elements in this subspace as linear combinations of Gaussian vectors, we can
leverage Comparison Theorems for Gaussian processes to yield our bounds.

Let $M = n^2(1-\mu)$ and let $G_1, \ldots, G_M$ be i.i.d. samples from $\mathfrak{G}(n)$. Let $X_0$ be a matrix of rank
$\beta n$. Let $P_{X_0}$ and $Q_{X_0}$ denote the projections onto the column and row spaces of $X_0$ respectively.
By Theorem 3.1 and Lemma 3.3, we need to show that for all $v \in \mathbb{R}^M$,

$$\left\|(I - P_{X_0})\left(\sum_{i=1}^{M} v_i G_i\right)(I - Q_{X_0})\right\|_* \ge \left\|P_{X_0}\left(\sum_{i=1}^{M} v_i G_i\right)Q_{X_0}\right\|_*. \tag{3.4}$$

That is, $\sum_{i=1}^{M} v_i G_i$ is an arbitrary element of the null space of $\mathcal{A}$, and this equation restates the
sufficient condition provided by Theorem 3.1. Now it is clear by homogeneity that we can restrict
our attention to those $v \in \mathbb{R}^M$ with norm 1. The following crucial lemma characterizes when the
expected value of this difference is nonnegative.

Lemma 3.4 Let $r = \beta n$ and suppose $P$ and $Q$ are projection operators onto $r$-dimensional
subspaces of $\mathbb{R}^n$. For $i = 1, \ldots, M$ let $G_i$ be sampled from $\mathfrak{G}(n)$. Then

$$E\left[\inf_{\|v\|_{\ell_2}=1} \left\|(I-P)\left(\sum_{i=1}^{M} v_i G_i\right)(I-Q)\right\|_* - \left\|P\left(\sum_{i=1}^{M} v_i G_i\right)Q\right\|_*\right] \ge \left(\frac{8}{3\pi} + o(1)\right)\left((1-\beta)^{3/2} - \beta^{3/2}\right) n^{3/2} - \sqrt{Mn}. \tag{3.5}$$

We will prove this lemma and a similar inequality required for the proof of the Strong Bound
in Section 3.4 below. But we now show how, using this lemma and a concentration of measure
argument, we prove Theorem 1.2.

First note that if we plug in $M = (1-\mu)n^2$ and divide the right hand side by $n^{3/2}$, the right
hand side of (3.5) is non-negative if (1.5) holds. To bound the probability that the difference in (3.4)
is non-negative, we employ a powerful concentration inequality for the Gaussian distribution bounding
deviations of smoothly varying functions from their expected value.

To quantify what we mean by smoothly varying, recall that a function $f$ is Lipschitz with respect
to the Euclidean norm if there exists a constant $L$ such that $|f(x) - f(y)| \le L\|x - y\|_{\ell_2}$ for all $x$
and $y$. The smallest such constant $L$ is called the Lipschitz constant of the map $f$. If $f$ is Lipschitz,
it cannot vary too rapidly. In particular, note that if $f$ is differentiable and Lipschitz, then $L$ is
a bound on the norm of the gradient of $f$. The following theorem states that the deviations of a
Lipschitz function applied to a Gaussian random variable have Gaussian tails.

Theorem 3.5 Let $x$ be a normally distributed random vector and let $f$ be a function with Lipschitz
constant $L$. Then

$$P\bigl[\,|f(x) - E[f(x)]| \ge t\,\bigr] \le 2\exp\left(-\frac{t^2}{2L^2}\right).$$
See [15] for a proof of this theorem with slightly weaker constants and several references for more
complicated proofs that give rise to this concentration inequality. The following lemma bounds
the Lipschitz constant of interest.

Lemma 3.6 For $i = 1, \ldots, M$, let $X_i \in \mathbb{R}^{n_1 \times n_1}$ and $Y_i \in \mathbb{R}^{n_2 \times n_2}$. Define the function

$$F_I(X_1, \ldots, X_M, Y_1, \ldots, Y_M) = \inf_{\|v\|_{\ell_2}=1} \left\|\sum_{i=1}^{M} v_i X_i\right\|_* - \left\|\sum_{i=1}^{M} v_i Y_i\right\|_*.$$

Then the Lipschitz constant of $F_I$ is at most $\sqrt{n_1 + n_2}$.



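The proof of this lemma is straightforward and can be found in the appendix. Using Theorem 3.5
and Lemmas 3.4 and 3.6, we can now bound

$$P\left[\inf_{\|v\|_{\ell_2}=1} \left\|(I-P_{X_0})\left(\sum_{i=1}^{M} v_i G_i\right)(I-Q_{X_0})\right\|_* - \left\|P_{X_0}\left(\sum_{i=1}^{M} v_i G_i\right)Q_{X_0}\right\|_* \le t n^{3/2}\right]$$
$$\le \exp\left(-\frac{1}{2}\left(\frac{8}{3\pi}\left((1-\beta)^{3/2} - \beta^{3/2}\right) - \sqrt{1-\mu} - t\right)^2 n^2 + o(n^2)\right). \tag{3.6}$$

Setting $t = 0$ completes the proof of Theorem 1.2. We will use this concentration inequality with
a non-zero $t$ to prove the Strong Bound.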

3.3 Proof of the Strong Bound 



The proof of the Strong Bound is similar to that of the Weak Bound except we prove that (3.4)
holds for all operators $P$ and $Q$ that project onto $r$-dimensional subspaces. Our proof will require
an $\epsilon$-net for the projection operators: a set of points such that any projection operator is within $\epsilon$
of some element in the set. We will show that if a slightly stronger bound than (3.4) holds on the
$\epsilon$-net, then (3.4) holds for all choices of row and column spaces.

Let us first examine how (3.4) changes when we perturb $P$ and $Q$. Let $P$, $Q$, $P'$ and $Q'$ all be
projection operators onto $r$-dimensional subspaces. Let $W$ be some $n \times n$ matrix and observe that

$$\begin{aligned}
&\bigl(\|(I-P')W(I-Q')\|_* - \|P'WQ'\|_*\bigr) - \bigl(\|(I-P)W(I-Q)\|_* - \|PWQ\|_*\bigr) \\
&\quad\le \|(I-P)W(I-Q) - (I-P')W(I-Q')\|_* + \|PWQ - P'WQ'\|_* \\
&\quad\le \|(I-P)W(I-Q) - (I-P')W(I-Q)\|_* + \|(I-P')W(I-Q) - (I-P')W(I-Q')\|_* \\
&\qquad + \|PWQ - P'WQ\|_* + \|P'WQ - P'WQ'\|_* \\
&\quad\le \|P-P'\|\,\|W\|_*\,\|I-Q\| + \|I-P'\|\,\|W\|_*\,\|Q-Q'\| + \|P-P'\|\,\|W\|_*\,\|Q\| + \|P'\|\,\|W\|_*\,\|Q-Q'\| \\
&\quad\le 2\bigl(\|P-P'\| + \|Q-Q'\|\bigr)\|W\|_*.
\end{aligned}$$

Here, the first and second lines follow from the triangle inequality, the third line follows because
$\|AB\|_* \le \|A\|\,\|B\|_*$, and the fourth line follows because $P$, $P'$, $Q$, and $Q'$ are all projection operators.
Since the same bound holds with the roles of $(P, Q)$ and $(P', Q')$ exchanged, rearranging gives

$$\|(I-P')W(I-Q')\|_* - \|P'WQ'\|_* \ge \|(I-P)W(I-Q)\|_* - \|PWQ\|_* - 2\bigl(\|P-P'\| + \|Q-Q'\|\bigr)\|W\|_*.$$
As we have just discussed, if we can prove that with overwhelming probability

$$\|(I-P)W(I-Q)\|_* - \|PWQ\|_* - 4\epsilon\|W\|_* \ge 0 \tag{3.7}$$

for all $P$ and $Q$ in an $\epsilon$-net for the projection operators onto $r$-dimensional subspaces, we will have
proved the Strong Bound.

To proceed, we need to know the size of an $\epsilon$-net. The following bound on such a net is due to
Szarek.

Theorem 3.7 (Szarek [27]) Consider the space of all projection operators on $\mathbb{R}^n$ projecting onto
$r$-dimensional subspaces endowed with the metric

$$d(P, P') = \|P - P'\|.$$

Then there exists an $\epsilon$-net in this metric space with cardinality at most $\left(\frac{3\pi}{2\epsilon}\right)^{r(n - r/2 - 1/2)}$.

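With this in hand, we now calculate the probability that for a given $P$ and $Q$ in the $\epsilon$-net,

$$\inf_{\|v\|_{\ell_2}=1} \left\|(I-P)\left(\sum_{i=1}^{M} v_i G_i\right)(I-Q)\right\|_* - \left\|P\left(\sum_{i=1}^{M} v_i G_i\right)Q\right\|_* \ge 4\epsilon \sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i=1}^{M} v_i G_i\right\|_*. \tag{3.8}$$

As we will show in Section 3.4, we can upper bound the right hand side of this inequality using
a bound similar to that of Lemma 3.4.

Lemma 3.8 For $i = 1, \ldots, M$ let $G_i$ be sampled from $\mathfrak{G}(n)$. Then

$$E\left[\sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i=1}^{M} v_i G_i\right\|_*\right] \le \left(\frac{8}{3\pi} + o(1)\right) n^{3/2} + \sqrt{Mn}. \tag{3.9}$$

Moreover, we prove the following in the appendix.

Lemma 3.9 For $i = 1, \ldots, M$, let $X_i \in \mathbb{R}^{n \times n}$ and define the function

$$F_S(X_1, \ldots, X_M) = \sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i=1}^{M} v_i X_i\right\|_*.$$

Then the Lipschitz constant of $F_S$ is at most $\sqrt{n}$.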



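Using Lemmas 3.8 and 3.9 combined with Theorem 3.5, we have that

$$P\left[4\epsilon \sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i=1}^{M} v_i G_i\right\|_* \ge t n^{3/2}\right] \le \exp\left(-\frac{1}{2}\left(\frac{t}{4\epsilon} - \frac{8}{3\pi} - \sqrt{1-\mu}\right)^2 n^2 + o(n^2)\right) \tag{3.10}$$

and if we set the exponents of (3.6) and (3.10) equal to each other and solve for $t$, we find after
some algebra and the union bound

$$\begin{aligned}
&P\left[\inf_{\|v\|_{\ell_2}=1} \left\|(I-P)\left(\sum_{i} v_i G_i\right)(I-Q)\right\|_* - \left\|P\left(\sum_{i} v_i G_i\right)Q\right\|_* \ge 4\epsilon \sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i} v_i G_i\right\|_*\right] \\
&\quad\ge P\left[\inf_{\|v\|_{\ell_2}=1} \left\|(I-P)\left(\sum_{i} v_i G_i\right)(I-Q)\right\|_* - \left\|P\left(\sum_{i} v_i G_i\right)Q\right\|_* > t n^{3/2} > 4\epsilon \sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i} v_i G_i\right\|_*\right] \\
&\quad\ge 1 - P\left[\inf_{\|v\|_{\ell_2}=1} \left\|(I-P)\left(\sum_{i} v_i G_i\right)(I-Q)\right\|_* - \left\|P\left(\sum_{i} v_i G_i\right)Q\right\|_* < t n^{3/2}\right] - P\left[4\epsilon \sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i} v_i G_i\right\|_* > t n^{3/2}\right] \\
&\quad\ge 1 - 2\exp\left(-\frac{1}{2}\left(\frac{8}{3\pi}\,\frac{(1-\beta)^{3/2} - \beta^{3/2} - 4\epsilon}{1 + 4\epsilon} - \sqrt{1-\mu}\right)^2 n^2 + o(n^2)\right).
\end{aligned}$$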



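Now, let $\Omega$ be an $\epsilon$-net for the set of projection operators discussed above. Again by the union
bound, we have that

$$\begin{aligned}
&P\left[\forall P, Q \in \Omega:\ \inf_{\|v\|_{\ell_2}=1} \left\|(I-P)\left(\sum_{i} v_i G_i\right)(I-Q)\right\|_* - \left\|P\left(\sum_{i} v_i G_i\right)Q\right\|_* \ge 4\epsilon \sup_{\|v\|_{\ell_2}=1} \left\|\sum_{i=1}^{M} v_i G_i\right\|_*\right] \\
&\quad\ge 1 - 2\exp\left(\left(-\frac{1}{2}\bigl(f(\beta,\epsilon) - \sqrt{1-\mu}\bigr)^2 + \beta(2-\beta)\log\left(\frac{3\pi}{2\epsilon}\right)\right) n^2 + o(n^2)\right),
\end{aligned} \tag{3.11}$$

where $f(\beta, \epsilon)$ is the function defined in Theorem 1.3. Finding the parameters $\mu$, $\beta$, and $\epsilon$ that make
the terms multiplying $n^2$ negative completes the proof of the Strong Bound.

3.4 Comparison Theorems for Gaussian Processes and the Proofs of Lemmas 3.4 and 3.8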

Both of the following Comparison Theorems provide sufficient conditions for when the expected
supremum or infimum of one Gaussian process is greater than that of another. Elementary proofs of
both of these theorems and several other Comparison Theorems can be found in §3.3 of [15].

Theorem 3.10 (Slepian's Lemma [24]) Let $X$ and $Y$ be Gaussian random vectors in $\mathbb{R}^N$
such that

$$\begin{cases} E[X_i X_j] \le E[Y_i Y_j] & \text{for all } i \ne j \\ E[X_i^2] = E[Y_i^2] & \text{for all } i. \end{cases}$$

Then

$$E\bigl[\max_i Y_i\bigr] \le E\bigl[\max_i X_i\bigr].$$

Theorem 3.11 (Gordon [12], [13]) Let $X = (X_{ij})$ and $Y = (Y_{ij})$ be Gaussian random vectors in
$\mathbb{R}^{N_1 \times N_2}$ such that

$$\begin{cases} E[X_{ij} X_{ik}] \le E[Y_{ij} Y_{ik}] & \text{for all } i, j, k \\ E[X_{ij} X_{lk}] \ge E[Y_{ij} Y_{lk}] & \text{for all } i \ne l \text{ and } j, k \\ E[X_{ij}^2] = E[Y_{ij}^2] & \text{for all } i, j. \end{cases}$$

Then

$$E\bigl[\min_i \max_j Y_{ij}\bigr] \le E\bigl[\min_i \max_j X_{ij}\bigr].$$



The following two lemmas follow from applications of these Comparison Theorems. We prove
them in more generality than necessary for the current work because both lemmas are interesting
in their own right. Let $\|\cdot\|_p$ be any norm on $D \times D$ matrices and let $\|\cdot\|_d$ be its associated dual
norm (see Section 1.2). Let us define the quantity $\sigma(\|G\|_p)$ as

$$\sigma(\|G\|_p) = \sup_{\|Z\|_d = 1} \|Z\|_F \tag{3.12}$$

and note that by this definition, we have

$$\sigma(\|G\|_p) = \sup_{\|Z\|_d = 1} \left(E\bigl[\langle G, Z \rangle^2\bigr]\right)^{1/2},$$

motivating the notation.

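This first lemma is now a straightforward consequence of Slepian's Lemma.

Lemma 3.12 Let $\Delta > 0$ and let $g$ be a Gaussian random vector in $\mathbb{R}^M$. Let $G, G_1, \ldots, G_M$ be
sampled i.i.d. from $\mathfrak{G}(D)$. Then

$$E\left[\sup_{\|v\|_{\ell_2}=1}\ \sup_{\|Y\|_d=1}\ \Delta\langle g, v\rangle + \left\langle \sum_{i=1}^{M} v_i G_i, Y \right\rangle\right] \le E\bigl[\|G\|_p\bigr] + \sqrt{M\bigl(\Delta^2 + \sigma(\|G\|_p)^2\bigr)}.$$

Proof We follow the strategy used to prove Theorem 3.20 in [15]. Let $G, G_1, \ldots, G_M$ be sampled i.i.d.
from $\mathfrak{G}(D)$, let $g \in \mathbb{R}^M$ be a Gaussian random vector, and let $\gamma$ be a zero-mean, unit-variance
Gaussian random variable. For $v \in \mathbb{R}^M$ and $Y \in \mathbb{R}^{D \times D}$ define

$$Q_L(v, Y) = \Delta\langle g, v\rangle + \left\langle \sum_{i=1}^{M} v_i G_i, Y \right\rangle + \sigma(\|G\|_p)\,\gamma$$

$$Q_R(v, Y) = \langle G, Y\rangle + \sqrt{\Delta^2 + \sigma(\|G\|_p)^2}\,\langle g, v\rangle.$$

Now observe that for any unit vectors $v, \hat v$ in $\mathbb{R}^M$ and any $D \times D$ matrices $Y, \hat Y$ with dual norm 1,

$$\begin{aligned}
&E[Q_L(v, Y) Q_L(\hat v, \hat Y)] - E[Q_R(v, Y) Q_R(\hat v, \hat Y)] \\
&\quad= \Delta^2\langle v, \hat v\rangle + \langle v, \hat v\rangle\langle Y, \hat Y\rangle + \sigma(\|G\|_p)^2 - \langle Y, \hat Y\rangle - \bigl(\Delta^2 + \sigma(\|G\|_p)^2\bigr)\langle v, \hat v\rangle \\
&\quad= \bigl(\sigma(\|G\|_p)^2 - \langle Y, \hat Y\rangle\bigr)\bigl(1 - \langle v, \hat v\rangle\bigr).
\end{aligned}$$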
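The difference in expectation is thus equal to zero if $v = \hat v$ and is greater than or equal to zero
if $v \ne \hat v$. Hence, by Slepian's Lemma and a compactness argument (see Proposition A.1 in the
Appendix),

$$E\left[\sup_{\|v\|_{\ell_2}=1}\ \sup_{\|Y\|_d=1} Q_L(v, Y)\right] \le E\left[\sup_{\|v\|_{\ell_2}=1}\ \sup_{\|Y\|_d=1} Q_R(v, Y)\right].$$

Since $\gamma$ has zero mean, the left-hand side equals the expectation in the statement of the lemma, while the
right-hand side equals $E[\|G\|_p] + \sqrt{\Delta^2 + \sigma(\|G\|_p)^2}\,E[\|g\|_{\ell_2}]$, which is at most
$E[\|G\|_p] + \sqrt{M(\Delta^2 + \sigma(\|G\|_p)^2)}$. This proves the lemma. ■

The following lemma can be proved in a similar fashion.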

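Lemma 3.13 Let $\|\cdot\|_p$ be a norm on $\mathbb{R}^{D_1 \times D_1}$ with dual norm $\|\cdot\|_d$ and let $\|\cdot\|_b$ be a norm on
$\mathbb{R}^{D_2 \times D_2}$. Let $g$ be a Gaussian random vector in $\mathbb{R}^M$. Let $G_0, G_1, \ldots, G_M$ be sampled i.i.d. from
$\mathfrak{G}(D_1)$ and $G'_1, \ldots, G'_M$ be sampled i.i.d. from $\mathfrak{G}(D_2)$. Then

$$\begin{aligned}
&E\left[\inf_{\|v\|_{\ell_2}=1}\ \inf_{\|Y\|_b=1}\ \sup_{\|Z\|_d=1} \left\langle \sum_{i=1}^{M} v_i G_i, Z \right\rangle + \left\langle \sum_{i=1}^{M} v_i G'_i, Y \right\rangle\right] \\
&\quad\ge E\bigl[\|G_0\|_p\bigr] - E\left[\sup_{\|v\|_{\ell_2}=1}\ \sup_{\|Y\|_b=1} \sigma(\|G\|_p)\langle g, v\rangle + \left\langle \sum_{i=1}^{M} v_i G'_i, Y \right\rangle\right].
\end{aligned}$$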
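Proof Define the functionals

$$P_L(v, Y, Z) = \left\langle \sum_{i=1}^{M} v_i G_i, Z \right\rangle + \left\langle \sum_{i=1}^{M} v_i G'_i, Y \right\rangle + \gamma\,\sigma(\|G_0\|_p)$$

$$P_R(v, Y, Z) = \langle G_0, Z\rangle + \sigma(\|G_0\|_p)\langle g, v\rangle + \left\langle \sum_{i=1}^{M} v_i G'_i, Y \right\rangle.$$

Let $v$ and $\hat v$ be unit vectors in $\mathbb{R}^M$, $Y$ and $\hat Y$ be $D_2 \times D_2$ matrices with $\|Y\|_b = \|\hat Y\|_b = 1$, and $Z$
and $\hat Z$ be $D_1 \times D_1$ matrices with $\|Z\|_d = \|\hat Z\|_d = 1$. Then we have

$$\begin{aligned}
&E[P_L(v, Y, Z) P_L(\hat v, \hat Y, \hat Z)] - E[P_R(v, Y, Z) P_R(\hat v, \hat Y, \hat Z)] \\
&\quad= \langle v, \hat v\rangle\langle Z, \hat Z\rangle + \langle v, \hat v\rangle\langle Y, \hat Y\rangle + \sigma(\|G_0\|_p)^2 - \langle Z, \hat Z\rangle - \sigma(\|G_0\|_p)^2\langle v, \hat v\rangle - \langle v, \hat v\rangle\langle Y, \hat Y\rangle \\
&\quad= \bigl(\sigma(\|G_0\|_p)^2 - \langle Z, \hat Z\rangle\bigr)\bigl(1 - \langle v, \hat v\rangle\bigr).
\end{aligned}$$

The difference in expectations is greater than or equal to zero, and equal to zero when $v = \hat v$ and
$Y = \hat Y$. Hence, by Gordon's Lemma and a compactness argument,

$$E\left[\inf_{\|v\|_{\ell_2}=1}\ \inf_{\|Y\|_b=1}\ \sup_{\|Z\|_d=1} P_L(v, Y, Z)\right] \ge E\left[\inf_{\|v\|_{\ell_2}=1}\ \inf_{\|Y\|_b=1}\ \sup_{\|Z\|_d=1} P_R(v, Y, Z)\right],$$

completing the proof. ■

Together with Lemmas 3.12 and 3.13, we can now prove Lemma 3.4.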
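Proof [of Lemma 3.4] In coordinates where $P$ and $Q$ project onto the span of the first $r$ standard basis
vectors, $(I-P)G_i(I-Q)$ and $PG_iQ$ are independent Gaussian blocks of sizes $(1-\beta)n$ and $\beta n$. So,
for $i = 0, \ldots, M$, let $G_i \in \mathfrak{G}((1-\beta)n)$ and $G'_i \in \mathfrak{G}(\beta n)$. Then

$$\begin{aligned}
&E\left[\inf_{\|v\|_{\ell_2}=1} \left\|\sum_{i=1}^{M} v_i G_i\right\|_* - \left\|\sum_{i=1}^{M} v_i G'_i\right\|_*\right] \\
&\quad= E\left[\inf_{\|v\|_{\ell_2}=1}\ \inf_{\|Y\|=1}\ \sup_{\|Z\|=1} \left\langle \sum_{i=1}^{M} v_i G_i, Z \right\rangle + \left\langle \sum_{i=1}^{M} v_i G'_i, Y \right\rangle\right] \\
&\quad\ge E\bigl[\|G_0\|_*\bigr] - E\left[\sup_{\|v\|_{\ell_2}=1}\ \sup_{\|Y\|=1} \sigma(\|G\|_*)\langle g, v\rangle + \left\langle \sum_{i=1}^{M} v_i G'_i, Y \right\rangle\right] \\
&\quad\ge E\bigl[\|G_0\|_*\bigr] - E\bigl[\|G'_0\|_*\bigr] - \sqrt{M}\sqrt{\sigma(\|G\|_*)^2 + \sigma(\|G'\|_*)^2},
\end{aligned}$$

where the first inequality follows from Lemma 3.13 and the second inequality follows from Lemma 3.12.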

Now we only need to plug in the expected values of the nuclear norm and the quantity $\sigma(\|G\|_*)$.
Let $G$ be sampled from $\mathfrak{G}(D)$. Then

$$
\mathbb{E}\|G\|_* = D\,\mathbb{E}\sigma_i = \frac{8}{3\pi} D^{3/2} + q(D) \tag{3.13}
$$

where $q(D)/D^{3/2} = o(1)$. The constant in front of $D^{3/2}$ comes from integrating $\sqrt{\lambda}$ against the
Marčenko-Pastur distribution (see, e.g., [17], [2]):

$$
\frac{1}{2\pi} \int_0^4 \sqrt{4 - t}\, dt = \frac{8}{3\pi} \approx 0.85 .
$$

Secondly, a straightforward calculation reveals

$$
\sigma(\|G\|_*) = \sup_{\|H\| \leq 1} \|H\|_F = \sqrt{D} .
$$

Plugging these values in with the appropriate dimensions completes the proof.
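
As a quick sanity check on the constant used above, the integral $\frac{1}{2\pi}\int_0^4 \sqrt{4-t}\,dt$ can be evaluated numerically and compared with the closed form $8/(3\pi)$. The sketch below is only illustrative: the function name `quarter_circle_constant` and the midpoint rule are choices made here, not part of the original argument.

```python
import math


def quarter_circle_constant(num_steps=200_000):
    """Midpoint-rule estimate of (1 / (2*pi)) * integral_0^4 sqrt(4 - t) dt."""
    width = 4.0 / num_steps
    total = 0.0
    for k in range(num_steps):
        t = (k + 0.5) * width  # midpoint of the k-th subinterval of [0, 4]
        total += math.sqrt(4.0 - t)
    return total * width / (2.0 * math.pi)


closed_form = 8.0 / (3.0 * math.pi)
estimate = quarter_circle_constant()
print(f"numerical: {estimate:.6f}  closed form 8/(3*pi): {closed_form:.6f}")
```

Both values agree to several decimal places and round to the 0.85 quoted in the text.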



Proof [of Lemma 3.8] This lemma immediately follows from applying Lemma 3.12 with $\Delta = 0$
and from the calculations at the end of the proof above. It is also an immediate consequence of
Lemma 3.21 from [15]. ■



4 Numerical Experiments 

We now show that these asymptotic estimates hold even for small values of $n$. We conducted a
series of experiments for a variety of matrix sizes $n$, ranks $r$, and numbers of measurements
$m$. As in the previous section, we let $\beta = r/n$ and $\mu = m/n^2$. For a fixed $n$, we constructed random
recovery scenarios for low-rank $n \times n$ matrices. For each $n$, we varied $\mu$ between 0 and 1, where the
matrix is completely determined. For a fixed $n$ and $\mu$, we generated all possible ranks such that
$\beta(2 - \beta) \leq \mu$. This cutoff was chosen because beyond that point there would be an infinite set of
matrices of rank $r$ satisfying the $m$ equations.
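
The rank cutoff can be made concrete with a few lines of code. Multiplying $\beta(2-\beta) \leq \mu$ through by $n^2$ gives the integer condition $r(2n - r) \leq m$, where $r(2n-r)$ is the number of degrees of freedom of an $n \times n$ matrix of rank $r$. The helper below is a hypothetical illustration (the name `admissible_ranks` and the choice to start at $r = 1$ are assumptions made here), not the code used for the experiments.

```python
def admissible_ranks(n, m):
    """Ranks r in 1..n with r * (2n - r) <= m, i.e. beta * (2 - beta) <= mu.

    Integer arithmetic is used so that boundary cases are not affected by
    floating-point rounding of beta = r / n and mu = m / n**2.
    """
    return [r for r in range(1, n + 1) if r * (2 * n - r) <= m]


# Example: n = 30 with m = 450 measurements (mu = 0.5).
print(admissible_ranks(30, 450))
```

For $n = 30$ and $m = 450$ the admissible ranks are 1 through 8, since $8 \cdot 52 = 416 \leq 450$ while $9 \cdot 51 = 459 > 450$.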

For each $(n, \mu, \beta)$ triple, we repeated the following procedure 10 times. A matrix of rank $r$ was
generated by choosing two random $n \times r$ factors $Y_L$ and $Y_R$ with i.i.d. random entries and setting
Figure 2: Random rank recovery experiments for (a) $n = 30$ and (b) $n = 40$. The color of each cell reflects
the empirical recovery rate. White denotes perfect recovery in all experiments, and black denotes failure
for all experiments. In both frames, we plot the Weak Bound (1.5), showing that the predicted recovery
regions are contained within the empirical regions, and the boundary between success and failure is well
approximated for large values of $\beta$.



$Y_0 = Y_L Y_R^*$. A matrix $A$ was sampled from the Gaussian ensemble with $m$ rows and $n^2$ columns.
Then the nuclear norm minimization

$$
\begin{array}{ll}
\text{minimize} & \|X\|_* \\
\text{subject to} & A \operatorname{vec} X = A \operatorname{vec} Y_0
\end{array}
$$

was solved with the freely available software SeDuMi [26], using the semidefinite programming
formulation described in [21]. On a 2.0 GHz laptop, each semidefinite program could be solved in
less than two minutes for $40 \times 40$ dimensional $X$. We declared $Y_0$ to be recovered if

$$
\|X - Y_0\|_F / \|Y_0\|_F < 10^{-3} .
$$
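
The data generation and the recovery test in this procedure are simple enough to sketch directly. The code below is a minimal illustration under stated assumptions: the factor entries are taken to be standard Gaussian (the text only says "i.i.d. random entries"), the function names are hypothetical, and the semidefinite program itself, which the experiments solved with SeDuMi, is not reproduced.

```python
import math
import random


def random_low_rank(n, r, rng):
    """Return Y0 = Y_L times the transpose of Y_R for two n x r factors.

    Assumption: entries are i.i.d. N(0, 1); the text does not name the law.
    """
    y_left = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    y_right = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    return [
        [sum(y_left[i][k] * y_right[j][k] for k in range(r)) for j in range(n)]
        for i in range(n)
    ]


def frobenius_norm(mat):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in mat for x in row))


def is_recovered(x, y0, tol=1e-3):
    """Recovery test: ||X - Y0||_F / ||Y0||_F < tol."""
    diff = [[a - b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y0)]
    return frobenius_norm(diff) / frobenius_norm(y0) < tol


rng = random.Random(0)
y0 = random_low_rank(5, 2, rng)
close = [[1.0001 * v for v in row] for row in y0]  # relative error about 1e-4
far = [[1.01 * v for v in row] for row in y0]      # relative error about 1e-2
print(is_recovered(close, y0), is_recovered(far, y0))
```

In a full experiment, `x` would be the solution returned by the nuclear norm solver, and the fraction of the 10 trials for which `is_recovered` holds would give the empirical recovery rate for that cell.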

Figure 2 displays the results of these experiments for $n = 30$ and $n = 40$. The color of each cell in the
figures reflects the empirical recovery rate of the 10 runs (scaled between 0 and 1). White denotes
perfect recovery in all experiments, and black denotes failure for all experiments. Remarkably,
not only are the plots very similar for $n = 30$ and $n = 40$, but the Weak Bound
falls completely within the white region and is an excellent approximation of the boundary between
success and failure for large $\beta$.
A Appendix

A.1 Rank-deficient case of Theorem 1.1

As promised above, here is the completion of the proof of Theorem 1.1.
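Proof In an appropriate basis, we may write

$$
X_0 = \begin{bmatrix} X_{11} & 0 \\ 0 & 0 \end{bmatrix}
\quad \text{and} \quad
X_* - X_0 = Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} .
$$

If $Y_{11}$ and $Y_{22} - Y_{21} Y_{11}^{-1} Y_{12}$ have full rank, then all our previous arguments apply. Thus, assume
that at least one of them is not full rank. Nonetheless, it is always possible to find an arbitrarily
small $\epsilon > 0$ such that

$$
Y_{11} + \epsilon I
\quad \text{and} \quad
\begin{bmatrix} Y_{11} + \epsilon I & Y_{12} \\ Y_{21} & Y_{22} + \epsilon I \end{bmatrix}
$$

are full rank. This, of course, is equivalent to having $Y_{22} + \epsilon I - Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12}$ full rank. We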
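can write

$$
\begin{aligned}
\|X_*\|_* &= \|X_0 + X_* - X_0\|_*
= \left\| \begin{bmatrix} X_{11} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} \right\|_* \\
&= \left\| \begin{bmatrix} X_{11} - \epsilon I & 0 \\ 0 & Y_{22} - Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix}
 + \begin{bmatrix} Y_{11} + \epsilon I & Y_{12} \\ Y_{21} & Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix} \right\|_* \\
&\geq \left\| \begin{bmatrix} X_{11} - \epsilon I & 0 \\ 0 & Y_{22} - Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix} \right\|_*
 - \left\| \begin{bmatrix} Y_{11} + \epsilon I & Y_{12} \\ Y_{21} & Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix} \right\|_* \\
&= \|X_{11} - \epsilon I\|_*
 + \left\| \begin{bmatrix} 0 & 0 \\ 0 & Y_{22} - Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix} \right\|_*
 - \left\| \begin{bmatrix} Y_{11} + \epsilon I & Y_{12} \\ Y_{21} & Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix} \right\|_* \\
&\geq \|X_0\|_* - 2r\epsilon
 + \left\| \begin{bmatrix} -\epsilon I & 0 \\ 0 & Y_{22} - Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix} \right\|_*
 - \left\| \begin{bmatrix} Y_{11} + \epsilon I & Y_{12} \\ Y_{21} & Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix} \right\|_* \\
&\geq \|X_0\|_* - 2r\epsilon ,
\end{aligned}
$$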

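where the last inequality follows from the condition of part 1 and noting that

$$
X_* - X_0 =
\begin{bmatrix} -\epsilon I & 0 \\ 0 & Y_{22} - Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix}
+
\begin{bmatrix} Y_{11} + \epsilon I & Y_{12} \\ Y_{21} & Y_{21}(Y_{11} + \epsilon I)^{-1} Y_{12} \end{bmatrix}
$$

lies in the null space of $\mathcal{A}(\cdot)$ and the first matrix above has rank more than $r$. But, since $\epsilon$ can be
arbitrarily small, this implies that $X_0 = X_*$. ■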



A.2 Lipschitz Constants of $F_I$ and $F_S$

We begin with the proof of Lemma 3.9 and then use this to estimate the Lipschitz constant in
Lemma 3.6.



Proof [of Lemma 3.9] Note that the function $F_S$ is convex, as we can write it as a supremum of a
collection of convex functions

$$
F_S(X_1, \ldots, X_M) = \sup_{\|v\|_{\ell_2}=1} \sup_{\|Z\| \leq 1} \left\langle \sum_{i=1}^M v_i X_i, Z \right\rangle . \tag{A.1}
$$

The Lipschitz constant $L$ is bounded above by the maximal norm of a subgradient of this convex
function. That is, if we denote $\bar X := (X_1, \ldots, X_M)$, then we have

$$
L \leq \sup_{\bar X} \sup_{\bar Z \in \partial F_S(\bar X)} \left( \sum_{i=1}^M \|Z_i\|_F^2 \right)^{1/2} .
$$

Now, by (A.1), a subgradient of $F_S$ at $\bar X$ is of the form $(v_1 Z, v_2 Z, \ldots, v_M Z)$, where $v$ has
norm 1 and $Z$ has operator norm 1. For any such subgradient

$$
\sum_{i=1}^M \|v_i Z\|_F^2 = \|Z\|_F^2 \leq n ,
$$

bounding the Lipschitz constant as desired.
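Proof [of Lemma 3.6] For $i = 1, \ldots, M$, let $X_i, \hat X_i \in \mathbb{R}^{n_1 \times n_1}$ and $Y_i, \hat Y_i \in \mathbb{R}^{n_2 \times n_2}$. Let

$$
\bar w = \arg\min_{\|u\|_{\ell_2}=1} \left\| \sum_{i=1}^M u_i \hat X_i \right\|_* - \left\| \sum_{i=1}^M u_i \hat Y_i \right\|_* .
$$

Then we have that

$$
\begin{aligned}
&F_I(X_1, \ldots, X_M, Y_1, \ldots, Y_M) - F_I(\hat X_1, \ldots, \hat X_M, \hat Y_1, \ldots, \hat Y_M) \\
&\quad = \left( \inf_{\|v\|_{\ell_2}=1} \left\| \sum_{i=1}^M v_i X_i \right\|_* - \left\| \sum_{i=1}^M v_i Y_i \right\|_* \right)
 - \left( \inf_{\|u\|_{\ell_2}=1} \left\| \sum_{i=1}^M u_i \hat X_i \right\|_* - \left\| \sum_{i=1}^M u_i \hat Y_i \right\|_* \right) \\
&\quad \leq \left\| \sum_{i=1}^M \bar w_i X_i \right\|_* - \left\| \sum_{i=1}^M \bar w_i Y_i \right\|_*
 - \left\| \sum_{i=1}^M \bar w_i \hat X_i \right\|_* + \left\| \sum_{i=1}^M \bar w_i \hat Y_i \right\|_* \\
&\quad \leq \left\| \sum_{i=1}^M \bar w_i (X_i - \hat X_i) \right\|_* + \left\| \sum_{i=1}^M \bar w_i (Y_i - \hat Y_i) \right\|_* \\
&\quad \leq \sup_{\|w\|_{\ell_2}=1} \left\| \sum_{i=1}^M w_i (X_i - \hat X_i) \right\|_* + \left\| \sum_{i=1}^M w_i (Y_i - \hat Y_i) \right\|_* \\
&\quad = \sup_{\|w\|_{\ell_2}=1} \left\| \sum_{i=1}^M w_i \tilde X_i \right\|_* + \left\| \sum_{i=1}^M w_i \tilde Y_i \right\|_* ,
\end{aligned}
$$

where $\tilde X_i = X_i - \hat X_i$ and $\tilde Y_i = Y_i - \hat Y_i$. This last expression is a convex function of $\tilde X_i$ and $\tilde Y_i$,

$$
\sup_{\|w\|_{\ell_2}=1} \left\| \sum_{i=1}^M w_i \tilde X_i \right\|_* + \left\| \sum_{i=1}^M w_i \tilde Y_i \right\|_*
= \sup_{\|w\|_{\ell_2}=1} \sup_{\|Z_X\| \leq 1} \sup_{\|Z_Y\| \leq 1}
\left\langle \sum_{i=1}^M w_i \tilde X_i, Z_X \right\rangle + \left\langle \sum_{i=1}^M w_i \tilde Y_i, Z_Y \right\rangle ,
$$

with $Z_X$ of size $n_1 \times n_1$ and $Z_Y$ of size $n_2 \times n_2$. Using an argument identical to the one presented in the proof
of Lemma 3.9, we have that a subgradient of this expression is of the form

$$
(w_1 Z_X, w_2 Z_X, \ldots, w_M Z_X, w_1 Z_Y, w_2 Z_Y, \ldots, w_M Z_Y)
$$

where $w$ has norm 1 and $Z_X$ and $Z_Y$ have operator norm 1, and thus

$$
\sum_{i=1}^M \|w_i Z_X\|_F^2 + \|w_i Z_Y\|_F^2 = \|Z_X\|_F^2 + \|Z_Y\|_F^2 \leq n_1 + n_2 ,
$$

completing the proof.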



A.3 Compactness Argument for Comparison Theorems

Proposition A.1 Let $\Omega$ be a compact metric space with distance function $\rho$. Suppose that $f$ and
$g$ are real-valued functions on $\Omega$ such that $f$ is continuous and for any finite subset $X \subset \Omega$

$$
\max_{x \in X} f(x) \leq \max_{x \in X} g(x) .
$$

Then

$$
\sup_{x \in \Omega} f(x) \leq \sup_{x \in \Omega} g(x) .
$$
Proof Let $\epsilon > 0$. Since $f$ is continuous and $\Omega$ is compact, $f$ is uniformly continuous on $\Omega$. That
is, there exists a $\delta > 0$ such that for all $x, y \in \Omega$, $\rho(x, y) < \delta$ implies $|f(x) - f(y)| < \epsilon$. Let $X_\delta$ be a
finite $\delta$-net for $\Omega$, which exists by compactness. Then, for any $x \in \Omega$, there is a $y$ in the $\delta$-net with $\rho(x, y) < \delta$ and hence

$$
f(x) \leq f(y) + \epsilon \leq \sup_{z \in X_\delta} f(z) + \epsilon \leq \sup_{z \in X_\delta} g(z) + \epsilon \leq \sup_{z \in \Omega} g(z) + \epsilon .
$$

Since this holds for all $x \in \Omega$ and $\epsilon > 0$, this completes the proof. ■